The receiver operating characteristic (ROC) curve, the positive
predictive value (PPV) curve and the negative predictive value (NPV)
curve are three measures of performance for a continuous diagnostic
biomarker. The ROC, PPV and NPV curves are often estimated empirically
to avoid assumptions about the distributional form of the biomarkers.
Recently, there has been a push to incorporate group sequential methods
into the design of diagnostic biomarker studies. A thorough
understanding of the asymptotic properties of the sequential empirical
ROC, PPV and NPV curves will provide more flexibility when designing
group sequential diagnostic biomarker studies. In this paper, we derive
asymptotic theory for the sequential empirical ROC, PPV and NPV curves
under case-control sampling using sequential empirical process theory.
We show that the sequential empirical ROC, PPV and NPV curves converge
to the sum of independent Kiefer processes and show how these results
can be used to derive asymptotic results for summaries of the
sequential empirical ROC, PPV and NPV curves.

1. Introduction. Several recent papers have discussed the application of
group sequential methodology to diagnostic biomarker studies [Tang,
Emerson and Zhou (2008), Tang and Liu (2010), Pepe et al. (2009)]. Group
sequential study designs (i.e., study designs with multiple interim
analyses) provide an opportunity to improve the efficiency of diagnostic
biomarker studies by allowing studies to terminate early when the
candidate marker is clearly superior or inferior to established markers
or historical levels of marker performance. Many group sequential
methods assume the existence of a test statistic with an independent
increments covariance structure [Jennison and Turnbull (2000)]. A
thorough understanding of the asymptotic properties of the sequential
empirical ROC, PPV and NPV curves and, specifically, verifying that
their summary measures have an independent increments covariance
structure, would provide great flexibility when designing
group sequential diagnostic biomarker studies.

Diagnostic biomarkers are used to classify a patient as a case or a
control. A dichotomous biomarker results in either a positive test,
indicating that the subject should be classified as a case, or a
negative test, indicating that the subject should be classified as a
control. Many biomarkers are measured on a continuous scale and a
threshold must be defined in order to translate a continuous biomarker
into a positive or negative test result. Let $D$ be a Bernoulli random
variable indicating disease status with prevalence $p$ and let $X$ be a
biomarker value with conditional distribution
$F(x \mid D = 1) = F_D(x)$ and $F(x \mid D = 0) = F_{\bar D}(x)$, where
$F_D(x)$ is the distribution function for the cases and $F_{\bar D}(x)$
is the distribution function for the controls. Furthermore, we define
$F(x) = pF_D(x) + (1 - p)F_{\bar D}(x)$ to be the biomarker
distribution function for the entire population. Without loss of
generality, assume that larger biomarker values are more indicative of
disease. For a threshold $c$, a biomarker value $X$ is translated into a
positive test result if it is greater than $c$ and a negative test
result if it is less than or equal to $c$.

The receiver operating characteristic (ROC) curve summarizes the
classification accuracy of a continuous diagnostic biomarker [Pepe
(2003)] by reporting the true positive fraction (TPF) and the false
positive fraction (FPF) for all possible cut-offs of the marker. For a
threshold $c$, $\mathrm{TPF}(c) = P[X > c \mid D = 1]$ and
$\mathrm{FPF}(c) = P[X > c \mid D = 0]$. The ROC curve is defined as

$\mathrm{ROC}(\cdot) = \{(\mathrm{TPF}(c), \mathrm{FPF}(c)),\ c \in (-\infty, \infty)\}$

and can alternately be expressed as

(1.1) $\mathrm{ROC}(t) = S_D(S_{\bar D}^{-1}(t)), \quad t \in (0, 1),$

where $S_D(x) = 1 - F_D(x)$ and $S_{\bar D}(x) = 1 - F_{\bar D}(x)$.
$\mathrm{ROC}(t)$ can be interpreted as the TPF corresponding to an FPF
of $t$. Alternately, one might be interested in the inverse of the ROC
curve,

(1.2) $\mathrm{ROC}^{-1}(v) = S_{\bar D}(S_D^{-1}(v)), \quad v \in (0, 1).$

$\mathrm{ROC}^{-1}(v)$ is indexed by the TPF and can be interpreted as
the FPF corresponding to a TPF of $v$.

The predictive accuracy of a dichotomous biomarker can be summarized
by the positive predictive value (PPV) and negative predictive value (NPV).
The PPV and NPV curves were proposed as an extension of PPV and NPV
to continuous markers [Moskowitz and Pepe (2004), Zheng et al. (2008)].
For a threshold $c$, $\mathrm{PPV}(c) = P[D = 1 \mid X > c]$ and
$\mathrm{NPV}(c) = P[D = 0 \mid X \le c]$. The PPV and NPV curves are
defined as $\mathrm{PPV}(c)$ and $\mathrm{NPV}(c)$ for all
$c \in (-\infty, \infty)$. In practice, PPV and NPV curves are indexed
by a summary of the marker distribution rather than a generic threshold
[Moskowitz and Pepe (2004), Zheng et al. (2008)]. In this paper, we
consider the PPV and NPV curves indexed by the FPF and by the
percentile value in the entire population.

The ROC, PPV and NPV curves are commonly estimated nonparametrically
to avoid making assumptions about the form of $F_D(x)$ and $F_{\bar D}(x)$.
This is particularly important in the case of the ROC, PPV and NPV curves 
because we are often interested in regions of the curve that correspond to the 
tails of these distributions. For example, a biomarker must possess a high 
specificity in order to be clinically useful in a low disease risk population 
screening setting, which corresponds to the upper tail of the biomarker dis- 
tribution among controls. 

Our understanding of the empirical ROC curve is enhanced by knowledge 
of its asymptotic properties. Hsieh and Turnbull (1996) showed that the 
empirical ROC curve converges to the sum of two independent Brownian 
bridges. The asymptotic normality of summary measures of the empirical 
ROC curve, such as the area under the ROC curve or a point on the ROC 
curve, can be derived from their work. To our knowledge, no asymptotic 
theory is available for the empirical PPV and NPV curves. 

Tang, Emerson and Zhou (2008) showed that a family of weighted area 
under the ROC curve (wAUC) statistics has an independent increments co- 
variance structure. It would be beneficial to show that this assumption holds 
for a larger class of summaries of the ROC curve. In this paper, we develop 
asymptotic theory for the sequential empirical ROC, PPV and NPV curves. 
Our results allow us to develop distribution theory for other summaries of 
the ROC curve and to develop distribution theory for summaries of the PPV 
and NPV curves. 

2. Notation and definitions. Before beginning our discussion of the
sequential empirical ROC, PPV and NPV curves, we provide definitions of
the sequential empirical estimates for the underlying distribution and
quantile functions. Let $X_{D,1}, X_{D,2}, \ldots, X_{D,n_D}$ be i.i.d.
marker values for the cases with distribution function $F_D(x)$, and
$X_{\bar D,1}, X_{\bar D,2}, \ldots, X_{\bar D,n_{\bar D}}$ be i.i.d.
marker values for the controls with distribution function
$F_{\bar D}(x)$. Furthermore, let $r_D$ and $r_{\bar D}$ refer to the
proportion of cases and controls, respectively, that are observed at a
given time point. The sequential empirical estimate of $F_D(x)$ is
defined as

$\hat F_{D,r_D}(x) = \begin{cases} 0, & 0 \le r_D < \frac{1}{n_D}, \\ \frac{1}{[r_D n_D]} \sum_{i=1}^{[r_D n_D]} 1[X_{D,i} \le x], \quad -\infty < x < \infty, & \frac{1}{n_D} \le r_D \le 1, \end{cases}$

and the sequential empirical estimate of $F_D^{-1}(t)$ is defined as

$\hat F_{D,r_D}^{-1}(t) = X_{D,k,[r_D n_D]}, \quad \frac{k-1}{[r_D n_D]} < t \le \frac{k}{[r_D n_D]}, \quad 1 \le k \le [r_D n_D],\ 0 < t \le 1,$

where $X_{D,1,[r_D n_D]}, X_{D,2,[r_D n_D]}, \ldots, X_{D,[r_D n_D],[r_D n_D]}$
are the sequential order statistics of the biomarker values for the
cases. The sequential empirical estimates of $S_D(x)$ and $S_D^{-1}(t)$
are defined as $\hat S_{D,r_D}(x) = 1 - \hat F_{D,r_D}(x)$ and
$\hat S_{D,r_D}^{-1}(t) = \hat F_{D,r_D}^{-1}(1 - t)$. The sequential
empirical estimates for the control population are defined in an
analogous fashion. The sequential empirical estimates of $F_D(x)$ and
$F_{\bar D}(x)$ lead to a natural definition of the sequential
empirical estimates of $F(x)$ and $F^{-1}(t)$,

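$\hat F_{r_D,r_{\bar D}}(x) = p \hat F_{D,r_D}(x) + (1 - p) \hat F_{\bar D,r_{\bar D}}(x)$

and

$\hat F_{r_D,r_{\bar D}}^{-1}(t) = \inf\{x : \hat F_{r_D,r_{\bar D}}(x) \ge t\},$

where $p$ is assumed to be known. $\hat F_{r_D,r_{\bar D}}(x)$ is a
linear combination of $\hat F_{D,r_D}(x)$ and
$\hat F_{\bar D,r_{\bar D}}(x)$ and is therefore indexed by both $r_D$,
the proportion of cases observed at a given time point, and
$r_{\bar D}$, the proportion of controls observed
at a given time point.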
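As a concrete illustration of these definitions, the following Python sketch computes the sequential empirical distribution and quantile functions from the first $[rn]$ observations of a sample. The function names (`seq_ecdf`, `seq_quantile`, `seq_pooled_ecdf`) and the convention that observations are stored in accrual order are assumptions made for illustration only; they do not come from the paper.

```python
import math


def seq_ecdf(xs, r, x):
    """Sequential empirical CDF: share of the first floor(r * n) values <= x.

    Returns 0 when no observation has accrued yet (r < 1/n).
    """
    m = math.floor(r * len(xs))
    if m == 0:
        return 0.0
    return sum(1 for v in xs[:m] if v <= x) / m


def seq_quantile(xs, r, t):
    """Sequential empirical quantile for 0 < t <= 1.

    Returns the k-th order statistic of the first m = floor(r * n) values,
    where k satisfies (k - 1)/m < t <= k/m, i.e. k = ceil(t * m).
    """
    m = math.floor(r * len(xs))
    if m == 0 or not 0.0 < t <= 1.0:
        raise ValueError("need at least one observation and 0 < t <= 1")
    k = math.ceil(t * m)
    return sorted(xs[:m])[k - 1]


def seq_pooled_ecdf(cases, controls, r_case, r_ctrl, p, x):
    """Prevalence-weighted estimate of F(x) = p F_D(x) + (1 - p) F_Dbar(x)."""
    return p * seq_ecdf(cases, r_case, x) + (1 - p) * seq_ecdf(controls, r_ctrl, x)


# Hypothetical marker values, listed in the order the subjects were enrolled.
cases = [3, 1, 2, 5, 4, 0]
controls = [0, 1, 2, 3]
half_way = seq_ecdf(cases, 0.5, 2)      # uses only the first three cases
median_so_far = seq_quantile(cases, 0.5, 0.5)
```

The survival-function estimates follow directly as `1 - seq_ecdf(...)` and `seq_quantile(xs, r, 1 - t)`.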

Throughout this paper, we let $0 < a < b < 1$, $0 < c < 1$, $0 < d < 1$ and
make the following assumptions:

(A1) $F_D(x)$ and $F_{\bar D}(x)$ are continuous distribution functions
with continuous densities $f_D(x)$ and $f_{\bar D}(x)$, respectively,

(A2) $f_D(x) > 0$ for $x \in (\sup\{x : F_D(x) = 0\}, \inf\{x : F_D(x) = 1\})$,
(A3) $f_{\bar D}(x) > 0$ for $x \in (\sup\{x : F_{\bar D}(x) = 0\}, \inf\{x : F_{\bar D}(x) = 1\})$,
(A4) $n_D / n_{\bar D} \to \lambda > 0$ as $n_D \to \infty$ and
$n_{\bar D} \to \infty$, that is, the ratio of cases to
controls converges to a constant that is greater than 0.

The asymptotic results in Section 3 make use of the Kiefer process. The 
Kiefer process, K{t,r), is a two-dimensional, mean-zero Gaussian process 
with covariance 

$\mathrm{Cov}(K(t_1, r_1), K(t_2, r_2)) = (t_1 \wedge t_2 - t_1 t_2)(r_1 \wedge r_2),$

where $\wedge$ represents the minimum. The Kiefer process behaves like
a Brownian bridge in $t$ and Brownian motion in $r$.
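To make the covariance structure tangible, the sketch below compares the Kiefer covariance with a Monte Carlo estimate from the sequential uniform empirical process, $K_n(t, r) = n^{-1/2} \sum_{i \le [nr]} (1[U_i \le t] - t)$, a standard finite-sample analogue of the Kiefer process. The sample size, number of replications, seed and evaluation points are arbitrary choices made for illustration.

```python
import random


def kiefer_cov(t1, r1, t2, r2):
    """Covariance of the Kiefer process at (t1, r1) and (t2, r2)."""
    return (min(t1, t2) - t1 * t2) * min(r1, r2)


def sequential_uniform_process(n, points, rng):
    """One draw of K_n(t, r) at each requested (t, r) pair.

    All points share the same uniforms, so the draw is a single path.
    """
    u = [rng.random() for _ in range(n)]
    values = []
    for t, r in points:
        m = int(n * r)  # number of observations accrued by "time" r
        total = sum((1.0 if ui <= t else 0.0) - t for ui in u[:m])
        values.append(total / n ** 0.5)
    return values


def monte_carlo_cov(points, n=200, reps=4000, seed=1):
    """Sample covariance of K_n between the two (t, r) points."""
    rng = random.Random(seed)
    draws = [sequential_uniform_process(n, points, rng) for _ in range(reps)]
    mean_a = sum(d[0] for d in draws) / reps
    mean_b = sum(d[1] for d in draws) / reps
    return sum((d[0] - mean_a) * (d[1] - mean_b) for d in draws) / (reps - 1)


points = [(0.3, 0.5), (0.6, 1.0)]
theoretical = kiefer_cov(0.3, 0.5, 0.6, 1.0)
simulated = monte_carlo_cov(points)
```

Setting $r_1 = r_2 = 1$ in `kiefer_cov` recovers the Brownian bridge covariance $t_1 \wedge t_2 - t_1 t_2$.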

The remainder of this paper proceeds as follows. In Section 3, we de- 
velop asymptotic theory for the sequential empirical ROC, PPV and NPV 
curves. First, we generalize the work of Hsieh and Turnbull (1996) to the 
sequential empirical ROC curve by showing that the sequential empirical 
ROC curve converges to the sum of independent Kiefer processes. Next, 
we develop asymptotic theory for the sequential empirical PPV and NPV
curves indexed by the FPF by writing them as functions of the sequential
empirical ROC curve. Finally, we follow the approach of Pyke and Shorack 
(1968) to develop asymptotic theory for the PPV and NPV curves indexed 
by the percentile value of the marker distribution. We validate our asymp- 
totic results by simulation in Section 4 and illustrate how they can be used 
to design group sequential diagnostic biomarker studies in Section 5. We 
conclude with a discussion in Section 6. 

3. Asymptotic results. 

3.1. The sequential empirical ROC curve. In this section, we provide
asymptotic results for the sequential empirical ROC curve. Results for the
inverse of the sequential empirical ROC curve are nearly identical; we
direct the reader to an associated technical report for details
[Koopmeiners and Feng (2010)]. The sequential empirical ROC curve,
$\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t)$, is defined by substituting
the sequential empirical estimates of $S_D(x)$ and $S_{\bar D}(x)$
into (1.1), yielding

$\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t) = \hat S_{D,r_D}(\hat S_{\bar D,r_{\bar D}}^{-1}(t)),$

and for ease of notation, we define

$R_{r_D,r_{\bar D}}(t) = n_D^{-1/2}[n_D r_D](\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t) - \mathrm{ROC}(t)).$

The primary result in this section provides asymptotic theory for
$R_{r_D,r_{\bar D}}(t)$. By developing asymptotic theory for
$R_{r_D,r_{\bar D}}(t)$, we are also able to develop asymptotic theory
for functionals of $R_{r_D,r_{\bar D}}(t)$ as a special case.
Theorem 3.1 establishes the convergence of $R_{r_D,r_{\bar D}}(t)$ to
the sum of independent Kiefer processes.

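Theorem 3.1. Assume (A1)-(A4) hold and let
$\frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))}$ be
bounded on $[a, b]$. As $n_D \to \infty$ and $n_{\bar D} \to \infty$,

$R_{r_D,r_{\bar D}}(t) \to_d K_1(\mathrm{ROC}(t), r_D) + \lambda^{1/2} \frac{r_D}{r_{\bar D}} \left( \frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))} \right) K_2(t, r_{\bar D})$

uniformly for $t \in [a, b]$, $r_D \in [c, 1]$ and $r_{\bar D} \in [d, 1]$,
where $K_1$ and $K_2$ are independent Kiefer processes.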



A proof of Theorem 3.1 can be found in the Appendix. Theorem 3.1 
generalizes the results of Hsieh and Turnbull (1996) to the sequential empir- 
ical ROC curve. The proof of Theorem 3.1 is similar to the proof found in 
Hsieh and Turnbull (1996) but our proof relies on the more powerful sequen- 
tial empirical process theory. Sequential empirical process theory generalizes 
asymptotic theory for the standard empirical process by introducing a pa- 
rameter for time. In doing so, asymptotic results for the sequential empirical 
process involve the Kiefer process. Using properties of the Kiefer process, we
are able to easily derive asymptotic results for summaries of the sequential
empirical ROC curve and verify that the independent increments assumption
holds in many cases. Furthermore, we can recover Hsieh and Turnbull's
result as a special case of Theorem 3.1 by letting $r_D$ and $r_{\bar D}$
both equal 1.

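Corollary 3.2. Assume (A1)-(A4) hold and let
$\frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))}$ be
bounded on $[a, b]$. As $n_D \to \infty$ and $n_{\bar D} \to \infty$,

$R_{1,1}(t) \to_d B_1(\mathrm{ROC}(t)) + \lambda^{1/2} \left( \frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))} \right) B_2(t)$

uniformly for $t \in [a, b]$, where $B_1$ and $B_2$ are independent
Brownian bridges.

Proof. Immediate from Theorem 3.1 and by noting that
$K(t, 1) =_d B(t)$. $\Box$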

An advantage to studying the asymptotic behavior of the sequential em- 
pirical ROC curve at the process level, rather than a single point on the 
sequential empirical ROC curve, is that we are able to study the joint be- 
havior of multiple points on the ROC curve. Corollary 3.3 provides a normal 
approximation for a vector of points on the sequential empirical ROC curve. 

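Corollary 3.3. Assume (A1)-(A4) hold and let
$\frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))}$ be
bounded on $[a, b]$. For $t_1, t_2, \ldots, t_J \in (0, 1)$,
$r_{D,1}, r_{D,2}, \ldots, r_{D,J} \in (0, 1]$ and
$r_{\bar D,1}, r_{\bar D,2}, \ldots, r_{\bar D,J} \in (0, 1]$, a vector
of arbitrary points on the sequential empirical ROC curve,
$(\widehat{\mathrm{ROC}}_{r_{D,1},r_{\bar D,1}}(t_1), \widehat{\mathrm{ROC}}_{r_{D,2},r_{\bar D,2}}(t_2), \ldots, \widehat{\mathrm{ROC}}_{r_{D,J},r_{\bar D,J}}(t_J))$,
is approximately multivariate normal with

$\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j) \sim N(\mathrm{ROC}(t_j), \sigma^2_{\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)}), \quad j = 1, 2, \ldots, J,$

where

$\sigma^2_{\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)} = \frac{\mathrm{ROC}(t_j)(1 - \mathrm{ROC}(t_j))}{n_D r_{D,j}} + \left( \frac{f_D(S_{\bar D}^{-1}(t_j))}{f_{\bar D}(S_{\bar D}^{-1}(t_j))} \right)^2 \frac{t_j(1 - t_j)}{n_{\bar D} r_{\bar D,j}}$

and

$\mathrm{Cov}[\widehat{\mathrm{ROC}}_{r_{D,i},r_{\bar D,i}}(t_i), \widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)]$
$\quad = \frac{(r_{D,i} \wedge r_{D,j})(\mathrm{ROC}(t_i) \wedge \mathrm{ROC}(t_j) - \mathrm{ROC}(t_i)\mathrm{ROC}(t_j))}{n_D r_{D,i} r_{D,j}}$
$\qquad + \left( \frac{f_D(S_{\bar D}^{-1}(t_i))}{f_{\bar D}(S_{\bar D}^{-1}(t_i))} \right) \left( \frac{f_D(S_{\bar D}^{-1}(t_j))}{f_{\bar D}(S_{\bar D}^{-1}(t_j))} \right) \frac{(r_{\bar D,i} \wedge r_{\bar D,j})(t_i \wedge t_j - t_i t_j)}{n_{\bar D} r_{\bar D,i} r_{\bar D,j}}.$

Proof. Immediate from Theorem 3.1. $\Box$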
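The covariance implied by Theorem 3.1 can be evaluated numerically once the case and control distributions are specified. The sketch below does so for the binormal scenario used in Section 4 (controls $N(0, 1)$, cases $N(1, 1)$, equal numbers of cases and controls); the function names are hypothetical. Under these assumptions the values it returns agree, to three decimals, with the theoretical covariance entries reported in Table 1 (for example 0.104, 0.129 and 0.322).

```python
from statistics import NormalDist

CONTROLS = NormalDist(0.0, 1.0)  # marker distribution among controls
CASES = NormalDist(1.0, 1.0)     # marker distribution among cases


def threshold(t):
    """Marker threshold with false positive fraction t among controls."""
    return CONTROLS.inv_cdf(1.0 - t)


def roc(t):
    """True ROC(t): true positive fraction at false positive fraction t."""
    return 1.0 - CASES.cdf(threshold(t))


def density_ratio(t):
    """Case density over control density, both evaluated at the threshold."""
    c = threshold(t)
    return CASES.pdf(c) / CONTROLS.pdf(c)


def cov_R(t_i, r_case_i, r_ctrl_i, t_j, r_case_j, r_ctrl_j, lam=1.0):
    """Limiting covariance of the scaled process R at two (t, r_D, r_Dbar) points."""
    roc_i, roc_j = roc(t_i), roc(t_j)
    # Contribution of K_1 (cases): Kiefer covariance in (ROC(t), r_D).
    case_part = (min(roc_i, roc_j) - roc_i * roc_j) * min(r_case_i, r_case_j)
    # Contribution of K_2 (controls), including its multipliers from Theorem 3.1.
    weight = (lam * (r_case_i / r_ctrl_i) * (r_case_j / r_ctrl_j)
              * density_ratio(t_i) * density_ratio(t_j))
    ctrl_part = weight * (min(t_i, t_j) - t_i * t_j) * min(r_ctrl_i, r_ctrl_j)
    return case_part + ctrl_part


def cov_roc_hat(t_i, r_case_i, r_ctrl_i, t_j, r_case_j, r_ctrl_j, n_case, n_ctrl):
    """Approximate covariance of two sequential empirical ROC estimates."""
    lam = n_case / n_ctrl
    scaled = cov_R(t_i, r_case_i, r_ctrl_i, t_j, r_case_j, r_ctrl_j, lam)
    return scaled / (n_case * r_case_i * r_case_j)


var_interim = cov_R(0.4, 0.4, 0.7, 0.4, 0.4, 0.7)
cov_interim_final = cov_R(0.4, 0.4, 0.7, 0.4, 1.0, 1.0)
var_final = cov_R(0.4, 1.0, 1.0, 0.4, 1.0, 1.0)
```

Dividing by $n_D r_{D,i} r_{D,j}$, as `cov_roc_hat` does, converts the covariance of the scaled process into the covariance of the ROC estimates themselves.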
Corollary 3.3 provides the asymptotic covariance for two points at
different locations and different times on the sequential empirical ROC
curve. This allows us to fully specify the joint sequential
distribution of multiple points on the ROC curve, and hence to design
group sequential diagnostic biomarker studies in which multiple points
on the ROC curve are treated as multiple endpoints. For example, we
might be interested in $\mathrm{ROC}(t_1)$ and $\mathrm{ROC}(t_2)$,
where $t_1$ is chosen for high specificity to rule patients in for
work-up and $t_2$ is chosen for high sensitivity to rule patients out
for invasive work.

Our interest in the sequential empirical ROC curve is motivated by the 
need to design group sequential diagnostic biomarker studies. Our ability 
to design group sequential diagnostic biomarker studies would be enhanced 
by showing that summaries of the sequential empirical ROC curve have an 
independent increments covariance structure. The simplest summary of the
ROC curve is a point on the ROC curve, $\mathrm{ROC}(t)$.
$\mathrm{ROC}(t)$ can be interpreted as the sensitivity at a
specificity of $1 - t$. Corollary 3.4 shows that the sequential
empirical estimator of $\mathrm{ROC}(t)$ is asymptotically normal and
has independent increments when divided by its variance.

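Corollary 3.4. Assume (A1)-(A4) hold and let
$\frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))}$ be
bounded on $[a, b]$. For $t \in (0, 1)$ and $J$ stopping times,
$(\widehat{\mathrm{ROC}}_{r_{D,1},r_{\bar D,1}}(t), \widehat{\mathrm{ROC}}_{r_{D,2},r_{\bar D,2}}(t), \ldots, \widehat{\mathrm{ROC}}_{r_{D,J},r_{\bar D,J}}(t))$
is approximately multivariate normal with

$\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t) \sim N(\mathrm{ROC}(t), \sigma^2_{\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t)}), \quad j = 1, 2, \ldots, J,$

and

$\mathrm{Cov}[\widehat{\mathrm{ROC}}_{r_{D,i},r_{\bar D,i}}(t), \widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t)] = \mathrm{Var}[\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t)] = \sigma^2_{\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t)}, \quad r_{D,i} \le r_{D,j},\ r_{\bar D,i} \le r_{\bar D,j},$

where $\sigma^2_{\widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t)}$ is
defined as in Corollary 3.3.

Proof. Immediate from Corollary 3.3. $\Box$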
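The independent increments property can be checked numerically from the covariance in Corollary 3.3 evaluated at a common $t$: the covariance between an earlier and a later analysis should equal the variance at the later analysis, so that the correlation between two looks is the square root of their information ratio. The sketch below assumes hypothetical inputs (a true ROC value, a density ratio, sample sizes and three analysis times) chosen purely for illustration.

```python
import math


def roc_point_variance(roc, ratio, t, n_case, n_ctrl, r_case, r_ctrl):
    """Asymptotic variance of the sequential empirical ROC(t) at one analysis."""
    return (roc * (1 - roc) / (n_case * r_case)
            + ratio ** 2 * t * (1 - t) / (n_ctrl * r_ctrl))


def roc_point_covariance(roc, ratio, t, n_case, n_ctrl,
                         r_case_i, r_ctrl_i, r_case_j, r_ctrl_j):
    """Corollary 3.3 covariance with t_i = t_j = t (same point, two analyses)."""
    case_part = (min(r_case_i, r_case_j) * roc * (1 - roc)
                 / (n_case * r_case_i * r_case_j))
    ctrl_part = (ratio ** 2 * min(r_ctrl_i, r_ctrl_j) * t * (1 - t)
                 / (n_ctrl * r_ctrl_i * r_ctrl_j))
    return case_part + ctrl_part


# Hypothetical design: three analyses with equal accrual of cases and controls.
ROC_T, RATIO, T = 0.77, 0.78, 0.4
N_CASE = N_CTRL = 100
looks = [(0.25, 0.25), (0.5, 0.5), (1.0, 1.0)]  # (r_D, r_Dbar) at each analysis

variances = [roc_point_variance(ROC_T, RATIO, T, N_CASE, N_CTRL, rd, rc)
             for rd, rc in looks]
cov_first_last = roc_point_covariance(ROC_T, RATIO, T, N_CASE, N_CTRL,
                                      *looks[0], *looks[2])
# Correlation between the first and last looks.
corr_first_last = cov_first_last / math.sqrt(variances[0] * variances[2])
```

With information defined as the reciprocal of the variance, `corr_first_last` equals the square root of the ratio of the information at the first look to the information at the last look, which is the canonical form assumed by standard group sequential boundaries.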

Asymptotic theory for other summary measures of the ROC curve, such 
as the area under the curve or the partial area under the curve, can also 
be derived from Theorem 3.1. This illustrates the flexibility of Theorem 3.1. 
By developing distribution theory for the sequential empirical ROC curve, 
we are able to derive distribution theory for summaries of the ROC curve 
as a special case. 

3.2. The sequential empirical PPV and NPV curves indexed by the false 
positive fraction. In this section, we consider the sequential empirical PPV 
and NPV curves indexed by the false positive fraction, $t$. The PPV and
NPV curves indexed by the false positive fraction can be written as
functions of the ROC curve, and their asymptotic properties can be
derived using the results from Section 3.1. Asymptotic results for the
PPV and NPV curves indexed by the true positive fraction, $v$, can
similarly be derived by writing the PPV and NPV curves as functions of
the inverse of the ROC curve
but are not presented in this paper. The interested reader is directed to
Koopmeiners and Feng (2010) for details. 

The PPV and NPV curves indexed by the false positive fraction are
defined as $\mathrm{PPV}(t) = P[D = 1 \mid X > S_{\bar D}^{-1}(t)]$ and
$\mathrm{NPV}(t) = P[D = 0 \mid X \le S_{\bar D}^{-1}(t)]$ for all
$t \in (0, 1)$ and can be written as functions of the ROC curve as follows:

(3.1) $\mathrm{PPV}(t) = \frac{\mathrm{ROC}(t)p}{\mathrm{ROC}(t)p + t(1 - p)}$

and

(3.2) $\mathrm{NPV}(t) = \frac{(1 - t)(1 - p)}{(1 - \mathrm{ROC}(t))p + (1 - t)(1 - p)}.$
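Equations (3.1) and (3.2) are straightforward to evaluate. The sketch below implements them as two small Python functions; the function names and the numerical inputs (a ROC value of 0.8 at a false positive fraction of 0.1, with prevalence 0.2) are illustrative assumptions, not values from the paper.

```python
def ppv_from_roc(roc, t, p):
    """Equation (3.1): PPV at false positive fraction t and prevalence p."""
    return roc * p / (roc * p + t * (1 - p))


def npv_from_roc(roc, t, p):
    """Equation (3.2): NPV at false positive fraction t and prevalence p."""
    return (1 - t) * (1 - p) / ((1 - roc) * p + (1 - t) * (1 - p))


# Hypothetical marker: 80% sensitivity at 90% specificity, 20% prevalence.
ppv = ppv_from_roc(0.8, 0.1, 0.2)   # 0.16 / (0.16 + 0.08)
npv = npv_from_roc(0.8, 0.1, 0.2)   # 0.72 / (0.04 + 0.72)
```

The same functions give the sequential empirical estimators when the true ROC value is replaced by its sequential empirical estimate.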

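The sequential empirical estimators of $\mathrm{PPV}(t)$ and
$\mathrm{NPV}(t)$ are defined by plugging the sequential empirical
estimator of $\mathrm{ROC}(t)$ into (3.1) and (3.2), yielding

$\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(t) = \frac{\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t)p}{\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t)p + t(1 - p)}$

and

$\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(t) = \frac{(1 - t)(1 - p)}{(1 - \widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t))p + (1 - t)(1 - p)}.$

From this point forward, we only consider
$\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(t)$ and note that results for
$\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(t)$ are nearly identical.
Again, for ease of notation, we define

$P_{r_D,r_{\bar D}}(t) = n_D^{-1/2}[n_D r_D](\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(t) - \mathrm{PPV}(t)).$

We begin by using the results of Section 3.1 to derive asymptotic theory
for $P_{r_D,r_{\bar D}}(t)$. Theorem 3.5 establishes the convergence of
$P_{r_D,r_{\bar D}}(t)$ to the sum of two independent Kiefer processes.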

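Theorem 3.5. Assume (A1)-(A4) hold and let
$\frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))}$ be
bounded on $[a, b]$. As $n_D \to \infty$ and $n_{\bar D} \to \infty$,

$P_{r_D,r_{\bar D}}(t) \to_d \left( \frac{t(1 - p)p}{(\mathrm{ROC}(t)p + t(1 - p))^2} \right) \left[ K_1(\mathrm{ROC}(t), r_D) + \lambda^{1/2} \frac{r_D}{r_{\bar D}} \left( \frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))} \right) K_2(t, r_{\bar D}) \right]$

uniformly for $t \in [a, b]$, $r_D \in [c, 1]$ and $r_{\bar D} \in [d, 1]$,
where $K_1$ and $K_2$ are independent Kiefer processes.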

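The proof of Theorem 3.5 relies on writing $P_{r_D,r_{\bar D}}(t)$ as a
function of $R_{r_D,r_{\bar D}}(t)$,

$P_{r_D,r_{\bar D}}(t) = \left( \frac{\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t)p}{\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t)p + t(1 - p)} - \frac{\mathrm{ROC}(t)p}{\mathrm{ROC}(t)p + t(1 - p)} \right) \times (\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t) - \mathrm{ROC}(t))^{-1} R_{r_D,r_{\bar D}}(t),$

and applying the results of Theorem 3.1. The difference quotient
multiplying $R_{r_D,r_{\bar D}}(t)$ converges to

$\frac{t(1 - p)p}{(\mathrm{ROC}(t)p + t(1 - p))^2},$

and $R_{r_D,r_{\bar D}}(t)$ converges to the sum of two independent
Kiefer processes by Theorem 3.1. A formal proof of Theorem 3.5 can be
found in Koopmeiners and Feng (2010).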

From Theorem 3.5, we can prove analogous results to Corollaries 3.3
and 3.4 for the sequential empirical PPV curve indexed by the FPF,
namely, that an arbitrary vector of points on the sequential empirical
PPV curve follows a multivariate normal distribution and that the
sequential empirical estimate of a point on the PPV curve is
approximately normally distributed with an independent increments
covariance structure. We leave the formal statement of these
corollaries for the Appendix but present the form of the covariance
between two arbitrary points on the sequential empirical PPV curve:

$\mathrm{Cov}[\widehat{\mathrm{PPV}}_{r_{D,i},r_{\bar D,i}}(t_i), \widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(t_j)]$
$\quad = \left( \frac{t_i(1 - p)p}{(\mathrm{ROC}(t_i)p + t_i(1 - p))^2} \right) \left( \frac{t_j(1 - p)p}{(\mathrm{ROC}(t_j)p + t_j(1 - p))^2} \right)$
$\qquad \times \mathrm{Cov}[\widehat{\mathrm{ROC}}_{r_{D,i},r_{\bar D,i}}(t_i), \widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)].$

$\mathrm{PPV}(t)$ is a function of $\mathrm{ROC}(t)$ and, therefore,
distribution theory for a vector of points on the PPV curve can also be
derived using the delta method
and Corollary 3.3.
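The multiplier in the PPV covariance is the derivative of the map from ROC(t) to PPV(t) in (3.1), which is what the delta method requires. The sketch below checks this numerically with a central finite difference and then rescales a ROC covariance into a PPV covariance. The function names and all numerical inputs are hypothetical and serve only to illustrate the calculation.

```python
def ppv_from_roc(roc, t, p):
    """Equation (3.1): PPV as a function of ROC(t), t and prevalence p."""
    return roc * p / (roc * p + t * (1 - p))


def ppv_slope(roc, t, p):
    """Derivative of (3.1) with respect to ROC(t): the delta-method multiplier."""
    return t * (1 - p) * p / (roc * p + t * (1 - p)) ** 2


def cov_ppv(cov_roc, roc_i, t_i, roc_j, t_j, p):
    """PPV covariance obtained by rescaling the ROC covariance at two points."""
    return ppv_slope(roc_i, t_i, p) * ppv_slope(roc_j, t_j, p) * cov_roc


# Hypothetical values: ROC(0.4) = 0.77, prevalence 0.2.
ROC_T, T, P = 0.77, 0.4, 0.2
h = 1e-6
finite_difference = (ppv_from_roc(ROC_T + h, T, P)
                     - ppv_from_roc(ROC_T - h, T, P)) / (2 * h)
analytic = ppv_slope(ROC_T, T, P)

# Variance of the PPV estimate from a (hypothetical) ROC variance of 0.0032.
ppv_variance = cov_ppv(0.0032, ROC_T, T, ROC_T, T, P)
```

Because the multiplier does not depend on the analysis time, the PPV estimate inherits the independent increments structure of the ROC estimate.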

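Asymptotic theory for the fixed-sample empirical PPV curve indexed by
the FPF, which was previously unavailable, can be derived as a special
case of Theorem 3.5 by letting $r_D$ and $r_{\bar D}$ equal 1. The
fixed-sample empirical PPV curve converges to the sum of independent
Brownian bridges,

$P_{1,1}(t) \to_d \left( \frac{t(1 - p)p}{(\mathrm{ROC}(t)p + t(1 - p))^2} \right) \left[ B_1(\mathrm{ROC}(t)) + \lambda^{1/2} \left( \frac{f_D(S_{\bar D}^{-1}(t))}{f_{\bar D}(S_{\bar D}^{-1}(t))} \right) B_2(t) \right],$

which allows us to derive a normal approximation for the empirical
estimate of a point on the PPV curve,

$\widehat{\mathrm{PPV}}_{1,1}(t) \sim N\left( \mathrm{PPV}(t), \left( \frac{t(1 - p)p}{(\mathrm{ROC}(t)p + t(1 - p))^2} \right)^2 \sigma^2_{\widehat{\mathrm{ROC}}_{1,1}(t)} \right),$

where $\sigma^2_{\widehat{\mathrm{ROC}}_{1,1}(t)}$ is defined as in
Corollary 3.3.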

3.3. The sequential empirical PPV and NPV curves indexed by the
percentile value. Finally, we consider the PPV and NPV curves indexed by
the proportion of the population that are classified as negative, $u$,
and positive, $1 - u$. In this case, the PPV and NPV curves are defined
as $\mathrm{PPV}(u) = P[D = 1 \mid X > F^{-1}(u)]$ and
$\mathrm{NPV}(u) = P[D = 0 \mid X \le F^{-1}(u)]$ for all $u \in (0, 1)$.
Under this indexing, the PPV curve can be written as

(3.3) $\mathrm{PPV}(u) = \frac{pS_D(F^{-1}(u))}{1 - u}$

and since the NPV curve can be written as

(3.4) $\mathrm{NPV}(u) = \frac{u - p}{u} + \frac{1 - u}{u} \mathrm{PPV}(u),$

it suffices to study the PPV curve when considering estimation of the NPV
curve.
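Identity (3.4) can be verified numerically. The sketch below does so for the binormal scenario of Section 4 (controls $N(0, 1)$, cases $N(1, 1)$, prevalence 0.2), computing $F^{-1}(u)$ by bisection because the mixture quantile has no closed form. The function names, the bisection bracket and the choice $u = 0.6$ are assumptions made for illustration.

```python
from statistics import NormalDist

CONTROLS = NormalDist(0.0, 1.0)
CASES = NormalDist(1.0, 1.0)


def pooled_cdf(x, p):
    """Population distribution F(x) = p F_D(x) + (1 - p) F_Dbar(x)."""
    return p * CASES.cdf(x) + (1 - p) * CONTROLS.cdf(x)


def pooled_quantile(u, p, lo=-10.0, hi=10.0):
    """F^{-1}(u) by bisection; the bracket [-10, 10] is ample for this model."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if pooled_cdf(mid, p) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ppv_at_percentile(u, p):
    """Equation (3.3): PPV when the top 1 - u of the population test positive."""
    q = pooled_quantile(u, p)
    return p * (1 - CASES.cdf(q)) / (1 - u)


def npv_direct(u, p):
    """NPV from its definition: P[D = 0, X <= F^{-1}(u)] / u."""
    q = pooled_quantile(u, p)
    return (1 - p) * CONTROLS.cdf(q) / u


def npv_from_ppv(u, p):
    """Equation (3.4): NPV expressed through PPV at the same percentile."""
    return (u - p) / u + (1 - u) / u * ppv_at_percentile(u, p)


direct = npv_direct(0.6, 0.2)
via_ppv = npv_from_ppv(0.6, 0.2)
```

The two routes to the NPV agree up to the numerical error of the bisection, which is why results for the PPV curve transfer directly to the NPV curve.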

The sequential empirical estimator of $\mathrm{PPV}(u)$ is found by
substituting the sequential empirical estimators of $S_D(x)$ and
$F(x)$, along with the known value of $p$, into (3.3),

(3.5) $\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u) = \frac{p \hat S_{D,r_D}(\hat F_{r_D,r_{\bar D}}^{-1}(u))}{1 - u},$

and the sequential empirical estimator of $\mathrm{NPV}(u)$ is found by
substituting the sequential empirical estimator of $\mathrm{PPV}(u)$
into (3.4),

(3.6) $\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u) = \frac{u - p}{u} + \frac{1 - u}{u} \widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u).$

Finally, we define

$P_{r_D,r_{\bar D}}(u) = n_D^{-1/2}[n_D r_D](\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u) - \mathrm{PPV}(u))$

and

$N_{r_D,r_{\bar D}}(u) = n_D^{-1/2}[n_D r_D](\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u) - \mathrm{NPV}(u))$

for mathematical convenience. We begin by developing distribution theory
for $P_{r_D,r_{\bar D}}(u)$. Theorem 3.6 establishes the convergence of
the sequential empirical PPV curve to the sum of two independent Kiefer
processes.
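Theorem 3.6. Assume (A1)-(A4) hold and let
$\frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}$ be bounded on $[a, b]$. As
$n_D \to \infty$ and $n_{\bar D} \to \infty$,

$P_{r_D,r_{\bar D}}(u) \to_d \frac{p(1 - p)}{1 - u} \frac{f_{\bar D}(F^{-1}(u))}{f(F^{-1}(u))} K_1(F_D(F^{-1}(u)), r_D)$
$\qquad + \lambda^{1/2} \frac{p(1 - p)}{1 - u} \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))} \frac{r_D}{r_{\bar D}} K_2(F_{\bar D}(F^{-1}(u)), r_{\bar D})$

uniformly for $u \in [a, b]$, $r_D \in [c, 1]$ and
$r_{\bar D} \in [d, 1]$, where $K_1$ and $K_2$ are
independent Kiefer processes.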

The proof of Theorem 3.6 is complicated by the fact that
$\hat S_{D,r_D}(x)$ and $\hat F_{r_D,r_{\bar D}}^{-1}(t)$ are
correlated because $\hat F_{r_D,r_{\bar D}}(x)$ is a linear combination
of $\hat F_{D,r_D}(x)$ and $\hat F_{\bar D,r_{\bar D}}(x)$. In
contrast, the sequential empirical ROC curve and the sequential
empirical PPV curve indexed by the FPF are functionals of two
independent sequential empirical estimators, $\hat S_{D,r_D}(x)$ and
$\hat S_{\bar D,r_{\bar D}}^{-1}(t)$, which makes it easier to show
that $R_{r_D,r_{\bar D}}(t)$ and $P_{r_D,r_{\bar D}}(t)$ converge to
the sum of independent Kiefer processes. To account for the correlation
between $\hat S_{D,r_D}(x)$ and $\hat F_{r_D,r_{\bar D}}^{-1}(t)$, we
follow the approach of Pyke and Shorack (1968), who prove a similar
result for two correlated, fixed-sample empirical processes. The proof
of Theorem 3.6 can be found in the Appendix.

Theorem 3.6 also establishes asymptotic theory for the sequential
empirical NPV curve because $\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u)$
is a function of $\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u)$.
Corollary 3.7 establishes the convergence of $N_{r_D,r_{\bar D}}(u)$ to
the sum of two independent Kiefer processes.

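Corollary 3.7. Assume (A1)-(A4) hold and let
$\frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}$ be bounded on $[a, b]$. As
$n_D \to \infty$ and $n_{\bar D} \to \infty$,

$N_{r_D,r_{\bar D}}(u) \to_d \frac{p(1 - p)}{u} \frac{f_{\bar D}(F^{-1}(u))}{f(F^{-1}(u))} K_1(F_D(F^{-1}(u)), r_D)$
$\qquad + \lambda^{1/2} \frac{p(1 - p)}{u} \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))} \frac{r_D}{r_{\bar D}} K_2(F_{\bar D}(F^{-1}(u)), r_{\bar D})$

uniformly for $u \in [a, b]$, $r_D \in [c, 1]$ and
$r_{\bar D} \in [d, 1]$, where $K_1$ and $K_2$ are
independent Kiefer processes.

Corollary 3.7 is immediate from Theorem 3.6 by noting that

$N_{r_D,r_{\bar D}}(u) = \frac{1 - u}{u} P_{r_D,r_{\bar D}}(u).$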

As with the ROC curve and the PPV curve indexed by the FPF, Theorem 3.6
and Corollary 3.7 allow us to develop distribution theory for summaries of
the PPV and NPV curves indexed by $u$. Distribution theory for a vector
of points on the PPV or NPV curve is left for the Appendix, but we
choose to highlight the joint distribution of the sequential empirical
estimate of a single point on the PPV or NPV curve. Corollary 3.8
establishes that the sequential empirical estimate of a point on the
PPV or NPV curve is asymptotically normal and has independent
increments when divided by its variance.

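Corollary 3.8. Assume (A1)-(A4) hold and let
$\frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}$ be bounded on $[a, b]$. For
$u \in (0, 1)$ and $J$ stopping times:

(A) $(\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u), \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(u), \ldots, \widehat{\mathrm{PPV}}_{r_{D,J},r_{\bar D,J}}(u))$
is approximately multivariate normal with

$\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u) \sim N(\mathrm{PPV}(u), \sigma^2_{\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u)}), \quad j = 1, 2, \ldots, J,$

and

$\mathrm{Cov}[\widehat{\mathrm{PPV}}_{r_{D,i},r_{\bar D,i}}(u), \widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u)] = \mathrm{Var}[\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u)] = \sigma^2_{\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u)}, \quad r_{D,i} \le r_{D,j},\ r_{\bar D,i} \le r_{\bar D,j},$

where

$\sigma^2_{\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u)} = (1 - p)^2 \left( \frac{f_{\bar D}(F^{-1}(u))}{f(F^{-1}(u))} \right)^2 \mathrm{PPV}(u) \left( \frac{p}{1 - u} - \mathrm{PPV}(u) \right) \frac{1}{n_D r_{D,j}}$
$\qquad + p^2 \left( \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))} \right)^2 (1 - \mathrm{PPV}(u)) \left( \mathrm{PPV}(u) - \frac{u - p}{1 - u} \right) \frac{1}{n_{\bar D} r_{\bar D,j}}.$

(B) $(\widehat{\mathrm{NPV}}_{r_{D,1},r_{\bar D,1}}(u), \widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u), \ldots, \widehat{\mathrm{NPV}}_{r_{D,J},r_{\bar D,J}}(u))$
is approximately multivariate normal with

$\widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u) \sim N(\mathrm{NPV}(u), \sigma^2_{\widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u)}), \quad j = 1, 2, \ldots, J,$

and

$\mathrm{Cov}[\widehat{\mathrm{NPV}}_{r_{D,i},r_{\bar D,i}}(u), \widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u)] = \mathrm{Var}[\widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u)] = \sigma^2_{\widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u)}, \quad r_{D,i} \le r_{D,j},\ r_{\bar D,i} \le r_{\bar D,j},$

where

$\sigma^2_{\widehat{\mathrm{NPV}}_{r_{D,j},r_{\bar D,j}}(u)} = (1 - p)^2 \left( \frac{f_{\bar D}(F^{-1}(u))}{f(F^{-1}(u))} \right)^2 (1 - \mathrm{NPV}(u)) \left( \mathrm{NPV}(u) - \frac{u - p}{u} \right) \frac{1}{n_D r_{D,j}}$
$\qquad + p^2 \left( \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))} \right)^2 \mathrm{NPV}(u) \left( \frac{1 - p}{u} - \mathrm{NPV}(u) \right) \frac{1}{n_{\bar D} r_{\bar D,j}}.$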

It is immediate from Theorem 3.6 and Corollary 3.7 that
$\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u)$ and
$\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u)$ are asymptotically normal
with an independent increments covariance structure. By noting that

$F_D(F^{-1}(u)) = 1 - \frac{1 - u}{p} \mathrm{PPV}(u) = \frac{u}{p}(1 - \mathrm{NPV}(u))$

and

$F_{\bar D}(F^{-1}(u)) = 1 - \frac{1 - u}{1 - p}(1 - \mathrm{PPV}(u)) = \frac{u}{1 - p} \mathrm{NPV}(u),$

we can write the asymptotic variances of
$\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u)$ and
$\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u)$ as functions of
$\mathrm{PPV}(u)$ and $\mathrm{NPV}(u)$, respectively. This provides a
better understanding of the mean-variance relationship for the
asymptotic distributions of $\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u)$
and $\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u)$ and, perhaps, provides
a form of the variance that is easier to work with in practical
situations (i.e., study design, estimating the standard error, etc.).

An important component of Theorem 3.6 and Corollary 3.7 is that not
only do $P_{r_D,r_{\bar D}}(u)$ and $N_{r_D,r_{\bar D}}(u)$ converge to
the sum of independent Kiefer processes, but they both converge to the
same two Kiefer processes. As a result, we are able to derive the
correlation between a point on the PPV curve and a point on the NPV
curve. Corollary 3.9 provides a bivariate normal approximation for a
point on the PPV curve and a point on the NPV curve.

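Corollary 3.9. Assume (A1)-(A4) hold and let
$\frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}$ be bounded on $[a, b]$. For
$u_1, u_2 \in (0, 1)$,
$(\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1), \widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2))$
is approximately bivariate normally distributed with

$\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1) \sim N(\mathrm{PPV}(u_1), \sigma^2_{\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1)})$

and

$\widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2) \sim N(\mathrm{NPV}(u_2), \sigma^2_{\widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2)}),$

with

$\mathrm{Cov}[\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1), \widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2)]$
$\quad = \frac{(1 - p)^2 u_1(1 - u_2)}{(1 - u_1)u_2} \frac{f_{\bar D}(F^{-1}(u_1))}{f(F^{-1}(u_1))} \frac{f_{\bar D}(F^{-1}(u_2))}{f(F^{-1}(u_2))} \frac{(r_{D,1} \wedge r_{D,2})(1 - \mathrm{NPV}(u_1)) \mathrm{PPV}(u_2)}{n_D r_{D,1} r_{D,2}}$
$\qquad + \frac{p^2 u_1(1 - u_2)}{(1 - u_1)u_2} \frac{f_D(F^{-1}(u_1))}{f(F^{-1}(u_1))} \frac{f_D(F^{-1}(u_2))}{f(F^{-1}(u_2))} \frac{(r_{\bar D,1} \wedge r_{\bar D,2}) \mathrm{NPV}(u_1)(1 - \mathrm{PPV}(u_2))}{n_{\bar D} r_{\bar D,1} r_{\bar D,2}}$

when $u_1 < u_2$ and

$\mathrm{Cov}[\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1), \widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2)]$
$\quad = (1 - p)^2 \frac{f_{\bar D}(F^{-1}(u_1))}{f(F^{-1}(u_1))} \frac{f_{\bar D}(F^{-1}(u_2))}{f(F^{-1}(u_2))} \frac{(r_{D,1} \wedge r_{D,2})(1 - \mathrm{NPV}(u_2)) \mathrm{PPV}(u_1)}{n_D r_{D,1} r_{D,2}}$
$\qquad + p^2 \frac{f_D(F^{-1}(u_1))}{f(F^{-1}(u_1))} \frac{f_D(F^{-1}(u_2))}{f(F^{-1}(u_2))} \frac{(r_{\bar D,1} \wedge r_{\bar D,2}) \mathrm{NPV}(u_2)(1 - \mathrm{PPV}(u_1))}{n_{\bar D} r_{\bar D,1} r_{\bar D,2}}$

when $u_2 < u_1$, where
$\sigma^2_{\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1)}$ and
$\sigma^2_{\widehat{\mathrm{NPV}}_{r_{D,2},r_{\bar D,2}}(u_2)}$ are
defined as in Corollary 3.8.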

The case of a point on the PPV curve and a point on the NPV curve is
presented for simplicity, but Corollary 3.9 can be extended to an
arbitrary vector of points on the PPV and NPV curves. Corollary 3.9 has
obvious practical implications. It is not uncommon to classify the
bottom $u_1 \times 100\%$ of the population as "low-risk," the top
$(1 - u_2) \times 100\%$ of the population as "high-risk" and the
remainder of the population as "moderate-risk." In
this case, one would be interested in the NPV of the low-risk group and the 
PPV of the high-risk group. Corollary 3.9 provides the joint convergence of 
these two estimates. 

Finally, we note that asymptotic results for the fixed-sample empirical
PPV and NPV curves indexed by the percentile value of the marker
distribution can be derived as a special case of the results in this
section. It is immediate from Theorem 3.6 and Corollary 3.7 that the
fixed-sample empirical PPV and NPV curves converge to the sum of
independent Brownian bridges by letting $r_D$ and $r_{\bar D}$ both
equal 1. Furthermore, Corollary 3.8 provides a normal approximation for
the fixed-sample empirical estimate of a point on the PPV or NPV curve
for the special case when $J = 1$.

Fig. 1. True ROC and PPV curves for the scenario considered in Section 4.

4. Finite sample properties. A simulation study was completed to
assess the finite sample properties of the results in Theorems 3.1, 3.5
and 3.6. We simulated 10,000 studies with $n_{\bar D}$ controls and
$n_D$ cases. Biomarker values for the controls were drawn from a
standard normal distribution and biomarker values for the cases were
drawn from a normal distribution with mean and standard deviation equal
to 1. A prevalence of 0.2 was used for estimation of the PPV curve.
Figure 1 presents the true ROC and PPV curves for this scenario. For
each realization, we calculated $R_{r_D,r_{\bar D}}(t)$,
$P_{r_D,r_{\bar D}}(t)$ and $P_{r_D,r_{\bar D}}(u)$ and evaluated the
expected value, normality and covariance for various combinations of
$r_D$, $r_{\bar D}$ and $t$ or $u$. Normality was evaluated by
providing a summary of information found in a normal q-q plot. Instead
of providing the entire plot, we provide the (simulated) probability of
being less than the 5th, 25th, 50th, 75th and 95th percentile of a
normal distribution with variance derived using the results in
Theorems 3.1, 3.5 and 3.6. Similarly, the simulated covariance matrices
were compared to the theoretical
covariance matrices derived using the results in Theorems 3.1, 3.5 and 3.6.
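A scaled-down version of this simulation can be sketched as follows. The code draws cases from $N(1, 1)$ and controls from $N(0, 1)$, computes the plug-in empirical ROC estimate at the current analysis, and returns draws of the scaled statistic $R$. The function names, the number of replications (2,000 rather than 10,000) and the seed are assumptions made for illustration, and only a single analysis time is simulated, so covariances across analyses are not reproduced here.

```python
import math
import random
from statistics import NormalDist, mean, variance


def empirical_roc(cases, controls, t):
    """Plug-in ROC(t): share of cases above the empirical (1 - t) quantile of controls."""
    m = len(controls)
    k = math.ceil((1.0 - t) * m)       # order statistic index for quantile 1 - t
    cutoff = sorted(controls)[k - 1]
    return sum(1 for x in cases if x > cutoff) / len(cases)


def simulate_R(n_case, n_ctrl, r_case, r_ctrl, t, reps, seed):
    """Monte Carlo draws of R = n_D^{-1/2} [n_D r_D] (ROC-hat - ROC) at one analysis."""
    rng = random.Random(seed)
    cutoff = NormalDist(0.0, 1.0).inv_cdf(1.0 - t)
    true_roc = 1.0 - NormalDist(1.0, 1.0).cdf(cutoff)
    m_case = math.floor(n_case * r_case)   # cases accrued so far
    m_ctrl = math.floor(n_ctrl * r_ctrl)   # controls accrued so far
    draws = []
    for _ in range(reps):
        # Only the accrued observations enter the estimate at this analysis.
        cases = [rng.gauss(1.0, 1.0) for _ in range(m_case)]
        controls = [rng.gauss(0.0, 1.0) for _ in range(m_ctrl)]
        roc_hat = empirical_roc(cases, controls, t)
        draws.append(m_case / math.sqrt(n_case) * (roc_hat - true_roc))
    return draws


draws = simulate_R(n_case=100, n_ctrl=100, r_case=1.0, r_ctrl=1.0,
                   t=0.4, reps=2000, seed=7)
simulated_mean = mean(draws)
simulated_variance = variance(draws)
```

For this configuration the theoretical variance from Theorem 3.1 is approximately 0.322, so `simulated_variance` should fall near that value, subject to Monte Carlo error and finite-sample bias.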

Table 1 presents simulation results for $R_{r_D,r_{\bar D}}(t)$. The
expected value was close to 0 in all cases, with only a small amount of
bias observed when $t = 0.2$. The probability of being less than the
theoretical 5th and 95th percentile was close to the nominal value for
all sample sizes, while the probability of being less than the 25th,
50th and 75th percentile was less than the nominal value with 50 cases
and 50 controls but approached the correct values as sample size
increased. The observed variance and covariance were less than expected
with 50 cases and 50 controls, but the observed covariance matrix
approached the theoretical covariance matrix in larger sample sizes.
This phenomenon is likely due to the sample space for
$\widehat{\mathrm{ROC}}(t)$ being restricted to the unit interval.
$\widehat{\mathrm{ROC}}(t)$ is less likely to equal 0 or 1 as sample
size increases, and the normal approximation will be more accurate.
Similar results were observed for $P_{r_D,r_{\bar D}}(t)$ and
$P_{r_D,r_{\bar D}}(u)$ but were omitted for brevity.



Table 1
Simulation results to evaluate the finite sample properties of Theorem 3.1. Presented are the expected value, simulated probability of
being less than 5th, 25th, 50th, 75th and 95th percentile of the normal distribution, the simulated covariance matrix and the theoretical
covariance matrix for $R_{r_D,r_{\bar D}}(t)$. 10,000 simulations were performed for each scenario. Covariance matrices are shown as
upper triangles, with columns in the same order as the rows.

                            5th    25th   50th   75th   95th   Observed                      Theoretical
                     Mean   %tile  %tile  %tile  %tile  %tile  covariance matrix             covariance matrix

$n_D = 50$, $n_{\bar D} = 50$
$R_{0.4,0.7}(0.4)$   0.01   0.05   0.17   0.46   0.63   0.98   0.1    0.117  0.079  0.103    0.104  0.129  0.081  0.104
$R_{1,1}(0.4)$       0.02   0.07   0.2    0.44   0.74   0.97          0.318  0.104  0.262           0.322  0.104  0.26
$R_{0.4,0.7}(0.2)$   0.03   0.04   0.22   0.47   0.73   0.96                 0.161  0.201                  0.171  0.225
$R_{1,1}(0.2)$       0.05   0.04   0.2    0.47   0.68   0.93                        0.544                         0.563

$n_D = 100$, $n_{\bar D} = 100$
$R_{0.4,0.7}(0.4)$   0.01   0.05   0.21   0.41   0.78   0.97   0.101  0.12   0.08   0.102    0.104  0.129  0.081  0.104
$R_{1,1}(0.4)$       0.02   0.05   0.24   0.48   0.76   0.96          0.318  0.104  0.26            0.322  0.104  0.26
$R_{0.4,0.7}(0.2)$   0.03   0.04   0.2    0.45   0.73   0.95                 0.164  0.205                  0.171  0.225
$R_{1,1}(0.2)$       0.05   0.04   0.23   0.47   0.73   0.95                        0.55                          0.563

$n_D = 200$, $n_{\bar D} = 200$
$R_{0.4,0.7}(0.4)$   0.01   0.06   0.22   0.44   0.7    0.96   0.104  0.121  0.081  0.102    0.104  0.129  0.081  0.104
$R_{1,1}(0.4)$       0.02   0.05   0.25   0.48   0.72   0.95          0.317  0.104  0.259           0.322  0.104  0.26
$R_{0.4,0.7}(0.2)$   0.03   0.04   0.25   0.5    0.7    0.94                 0.168  0.212                  0.171  0.225
$R_{1,1}(0.2)$       0.05   0.05   0.23   0.46   0.72   0.95                        0.555                         0.563

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5. Application. The results of Section 3 provide fundamental theory that 
allows existing group sequential methodology to be applied to summaries 
of the ROC, PPV and NPV curves. In this section, we present an exam- 
ple of how these results can be used to design group sequential diagnostic 
biomarker studies. Our application is presented in the context of a study to 
evaluate the diagnostic accuracy of des-gamma carboxyprothrombin (DCP), 
a novel biomarker for the early detection of hepatocellular carcinoma (HCC).
A multi-center study was completed to compare the diagnostic accuracy of
DCP to that of alpha-fetoprotein (AFP), the most widely used biomarker
for the detection of HCC [Marrero et al. (2009)], but in our application we
will only consider the design of a study to compare DCP to historical levels
of diagnostic accuracy for AFP.

We consider a study to evaluate the predictive accuracy of DCP using
the following novel design that makes use of the joint asymptotic theory for
the PPV and NPV curves derived in Section 3.8. Assume that the prevalence
of HCC in the population of interest is 0.2. In this case, one might call
the bottom 60% of biomarker values "negative," the top 10% of
the biomarker values "positive" and refer the remaining subjects for further
evaluation. Under this scenario, we would desire a high NPV for negative test
results, NPV(0.6), and a high PPV for positive test results, PPV(0.9). The
NPV(0.6) for AFP is 0.92 and the PPV(0.9) is 0.82. To determine if DCP
improves on the predictive accuracy of AFP, we would test the hypothesis,

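$$H_0\colon\ \mathrm{NPV}(0.6) \le 0.9 \ \text{ or } \ \mathrm{PPV}(0.9) \le 0.8$$

versus

$$H_a\colon\ \mathrm{NPV}(0.6) > 0.9 \ \text{ and } \ \mathrm{PPV}(0.9) > 0.8$$

using the test statistics $Z_{\mathrm{NPV}}(u_1)$ and $Z_{\mathrm{PPV}}(u_2)$, where $Z_{\mathrm{NPV}}(u_1)$ is defined
as

$$Z_{\mathrm{NPV}}(u_1) = \frac{\widehat{\mathrm{NPV}}(0.6) - \mathrm{NPV}(0.6)_0}{\sigma_{\mathrm{NPV}(0.6)_0}}$$

and $Z_{\mathrm{PPV}}(u_2)$ is defined in an analogous fashion.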
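
As an illustration of how such a statistic might be computed from case-control data, the following is a minimal sketch. It assumes the prevalence is known, that the percentile u refers to the prevalence-weighted mixture of the case and control distributions, and that PPV(u) = p S_D(c)/(1 - u) and NPV(u) = (1 - p) F_{\bar D}(c)/u at the corresponding threshold c. The function names are hypothetical, and the null standard error is taken as an input rather than derived here.

```python
def pooled_threshold(cases, controls, prevalence, u):
    """Smallest observation whose prevalence-weighted empirical CDF is >= u."""
    w_case = prevalence / len(cases)
    w_control = (1.0 - prevalence) / len(controls)
    pooled = sorted(
        [(x, w_case) for x in cases] + [(x, w_control) for x in controls]
    )
    cumulative = 0.0
    for x, weight in pooled:
        cumulative += weight
        if cumulative >= u - 1e-12:  # tolerance for floating-point sums
            return x
    return pooled[-1][0]


def empirical_ppv_npv(cases, controls, prevalence, u):
    """Plug-in PPV(u) and NPV(u) at the u-th population percentile."""
    c = pooled_threshold(cases, controls, prevalence, u)
    surv_cases = sum(x > c for x in cases) / len(cases)
    cdf_controls = sum(x <= c for x in controls) / len(controls)
    ppv = prevalence * surv_cases / (1.0 - u)
    npv = (1.0 - prevalence) * cdf_controls / u
    return ppv, npv


def z_statistic(estimate, null_value, null_se):
    """Standardized difference between an estimate and its null value."""
    return (estimate - null_value) / null_se
```

For example, `z_statistic(npv_hat, 0.9, se_npv_null)` would give the NPV statistic once an estimate `npv_hat` and a null standard error `se_npv_null` (both hypothetical names) are available.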

We consider a group sequential design using the error spending approach
proposed by Hwang, Shih and De Cani (1990). The overall null hypothesis
will only be rejected if the null hypotheses for both NPV(0.6) and PPV(0.9)
are rejected. In the context of a group sequential study, this means that the
study will stop early to reject the null hypothesis if $Z_{\mathrm{NPV}}(u_1)$ and $Z_{\mathrm{PPV}}(u_2)$
both cross the boundary for rejecting the null hypothesis, but the study will
stop early for futility if either $Z_{\mathrm{NPV}}(u_1)$ or $Z_{\mathrm{PPV}}(u_2)$ crosses the futility
boundary. This implies that we do not need to adjust the type-I error rate to
account for multiple endpoints, but we do need to consider the joint
probability of rejecting the null hypothesis when determining the power.

Table 2

Simulation results to evaluate the operating characteristics of a study to evaluate the predictive accuracy of DCP using a fixed-sample design and group sequential designs with two, three or four stopping times. Presented are the probability of rejecting the null hypothesis and expected sample size under the null and alternative hypotheses. 10,000 simulations were performed for each scenario.

            NPV(0.6) = 0.90      NPV(0.6) = 0.95      NPV(0.6) = 0.90      NPV(0.6) = 0.95
            PPV(0.9) = 0.80      PPV(0.9) = 0.80      PPV(0.9) = 0.90      PPV(0.9) = 0.90
Stopping
times       P(reject)  E(n_D)    P(reject)  E(n_D)    P(reject)  E(n_D)    P(reject)  E(n_D)

J = 1       0.003      702       0.03       702       0.026      702       0.917      702
J = 2       0.004      432       0.026      492.4     0.024      489.5     0.924      624.5
J = 3       0.004      367.4     0.022      431.3     0.023      433       0.917      580.1
J = 4       0.002      340       0.023      410.7     0.024      417.2     0.911      571.1


The sample size for our study is chosen to achieve 90% power under the
alternative hypothesis NPV(0.6) = 0.95 and PPV(0.9) = 0.90. A closed-form
formula for determining the required sample size is not available. Instead,
the sample size for a fixed-sample design is derived by numerically solving

$$P\bigl(Z_{\mathrm{NPV}}(u_1) > z_{1-\alpha/2},\ Z_{\mathrm{PPV}}(u_2) > z_{1-\alpha/2} \mid \mathrm{NPV}(u_1) = 0.95,\ \mathrm{PPV}(u_2) = 0.90\bigr) = 0.90$$

for $n_D$, where the joint distribution of $Z_{\mathrm{NPV}}(u_1)$ and $Z_{\mathrm{PPV}}(u_2)$ is derived
by applying the delta method to the joint asymptotic normal distribution
of $\widehat{\mathrm{NPV}}_{r_D,r_{\bar D}}(u_1)$ and $\widehat{\mathrm{PPV}}_{r_D,r_{\bar D}}(u_2)$ found in Corollary 3.9. Assuming a
one-to-one ratio of cases to controls, 702 cases are required to achieve 90% power
under the alternative hypothesis. This sample size must be multiplied by an
inflation factor to determine the maximum sample size for a group sequential
design (i.e., the sample size if the study does not stop at the interim analyses)
in order for the group sequential design to maintain the same type-I error
rate and power as the fixed-sample design [Jennison and Turnbull (2000)].
Using the gsDesign package in R, we find that the maximum sample sizes for
group sequential studies with two, three and four stopping times are 724,
737 and 745 cases, respectively. However, as illustrated in the simulation
which follows, the actual sample sizes required in group sequential studies
are generally smaller than these maximum values.
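
The arithmetic behind these figures is simple to reproduce. The sketch below computes the implied inflation factors from the sample sizes quoted above and, using the expected sample sizes reported in Table 2, the expected number of cases relative to the fixed-sample design; the dictionary and function names are illustrative.

```python
FIXED_SAMPLE_CASES = 702

# Maximum number of cases by number of stopping times J (from the text).
MAX_CASES = {2: 724, 3: 737, 4: 745}

# Expected number of cases from Table 2 under the null scenario
# (NPV(0.6) = 0.90, PPV(0.9) = 0.80) and the alternative scenario
# (NPV(0.6) = 0.95, PPV(0.9) = 0.90).
EXPECTED_CASES_NULL = {2: 432.0, 3: 367.4, 4: 340.0}
EXPECTED_CASES_ALT = {2: 624.5, 3: 580.1, 4: 571.1}


def inflation_factor(max_cases, fixed_cases=FIXED_SAMPLE_CASES):
    """Ratio of the group sequential maximum to the fixed sample size."""
    return max_cases / fixed_cases


def relative_expected_size(expected_cases, fixed_cases=FIXED_SAMPLE_CASES):
    """Expected sample size as a fraction of the fixed sample size."""
    return expected_cases / fixed_cases


if __name__ == "__main__":
    for j in sorted(MAX_CASES):
        print(
            j,
            round(inflation_factor(MAX_CASES[j]), 3),
            round(relative_expected_size(EXPECTED_CASES_NULL[j]), 3),
            round(relative_expected_size(EXPECTED_CASES_ALT[j]), 3),
        )
```

The inflation factors are only a few percent above one, whereas the expected sample sizes are well below the fixed-sample requirement, which is the trade-off the group sequential design exploits.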

Table 2 presents simulation results using a fixed-sample design and group 
sequential designs with two, three and four stopping times. Biomarker values 
for the controls were simulated from a standard normal distribution and 
biomarker values for the cases were simulated from a normal distribution 
with mean and variance chosen to achieve the desired values of NPV(0.6)
and PPV(0.9). The advantages of group sequential designs are clear. The 
group sequential designs have similar type-I error rate and power to the 
fixed-sample design but with substantially smaller expected sample sizes in 
all scenarios. 
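
The text does not say how the case mean and variance were chosen, but under the stated model one possible approach has a closed form. The sketch below assumes a prevalence p, standard normal controls, normal cases, and the definitions NPV(u) = (1 - p) F_{\bar D}(c)/u and PPV(u) = p S_D(c)/(1 - u) at the u-th percentile c of the prevalence-weighted mixture. The function names are hypothetical, and the second function is only an independent numerical check.

```python
from statistics import NormalDist

STD = NormalDist()  # assumed control distribution


def case_distribution(prevalence, u_neg, npv_target, u_pos, ppv_target):
    """Mean and SD of a normal case distribution matching both targets."""
    # Thresholds on the biomarker scale, from the control distribution.
    c_neg = STD.inv_cdf(u_neg * npv_target / (1 - prevalence))
    c_pos = STD.inv_cdf(1 - (1 - u_pos) * (1 - ppv_target) / (1 - prevalence))
    # Standardized positions the case distribution must have there.
    z_neg = STD.inv_cdf(u_neg * (1 - npv_target) / prevalence)
    z_pos = STD.inv_cdf(1 - (1 - u_pos) * ppv_target / prevalence)
    sigma = (c_pos - c_neg) / (z_pos - z_neg)
    mu = c_neg - sigma * z_neg
    return mu, sigma


def ppv_npv(mu, sigma, prevalence, u):
    """PPV(u) and NPV(u) for the fitted model, found by bisection."""
    case = NormalDist(mu, sigma)

    def mixture_cdf(x):
        return prevalence * case.cdf(x) + (1 - prevalence) * STD.cdf(x)

    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mixture_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2
    ppv = prevalence * (1 - case.cdf(c)) / (1 - u)
    npv = (1 - prevalence) * STD.cdf(c) / u
    return ppv, npv


if __name__ == "__main__":
    mu, sigma = case_distribution(0.2, 0.6, 0.95, 0.9, 0.90)
    print(round(mu, 3), round(sigma, 3))
```

Not every pair of targets is attainable: the arguments to `inv_cdf` must lie strictly between 0 and 1 and the resulting `sigma` must be positive.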

6. Discussion. In this paper, we derived asymptotic properties of the
sequential empirical ROC, PPV and NPV curves. We first extended the 
work of Hsieh and Turnbull (1996) to the sequential empirical ROC curve 
and used these results to develop distribution theory for summaries of the 
sequential empirical ROC curve. Next, we considered asymptotic theory for 
the sequential empirical PPV curve indexed by the FPF and percentile value 
in the entire population. These results were used to develop distribution 
theory for summaries of the sequential empirical PPV curve. Asymptotic 
theory for the fixed-sample PPV curve, which was previously unavailable, 
was developed as a special case. 

This work was motivated by the desire to design group sequential diag- 
nostic biomarker studies. In Section 5, we illustrated how our results can be 
used to design group sequential diagnostic biomarker studies. Our simulation
results clearly illustrate the advantages of group sequential designs. In
both cases, the group sequential designs have similar type-I error rate and
power to the fixed-sample designs but with substantially smaller expected
sample size.

An advantage of our approach is that we are able to investigate the joint
behavior of multiple points on the ROC and PPV curves. The primary endpoint
of a diagnostic biomarker study may be a single point on the ROC
or PPV curve but other points on the ROC or PPV curve may also be of 
interest. The results of Theorems 3.1, 3.5 and 3.6 allow us to apply existing 
group sequential methodology for analyzing multiple endpoints to scenarios 
where multiple points on the ROC or PPV curve are of interest in a group 
sequential diagnostic biomarker study [Liu and Hall (2001)]. 

We considered estimation of the sequential empirical ROC and PPV curves
under case-control sampling. The asymptotic properties of the sequential
empirical ROC and PPV curves under other sampling schemes are also of
interest. We are currently working on extending the results of this paper to
estimation of the sequential empirical ROC and PPV curves under cohort
and nested case-control sampling.

The theory developed in this paper applies to sequential testing of the
diagnostic accuracy of a continuous test. In many cases, diagnostic tests take
the form of multilevel ordinal data (cancer staging, for example). Methods
exist extending the ROC curve to ordinal data [Dorfman and Alf (1960)]
but further work is needed to verify that group sequential methods can be
applied in these settings.

Response adaptive clinical trials have been proposed as a means to provide 
greater flexibility when designing therapeutic clinical trials. Response adap- 
tive clinical trials adjust the design characteristics of the study (sample size, 
percent randomized to each group, etc.) in response to outcomes for subjects 
enrolled earlier in the study. Recently, Zhu and Hu (2010) showed that a class
of test statistics from a response adaptive clinical trial converges to Brownian
motion when considered sequentially (similar to what we have shown
for the empirical ROC, PPV and NPV curves), which allows existing group
sequential methodology to be applied to response adaptive clinical trials.
Future work will be needed to consider how response adaptive designs can
be applied in the setting of group sequential diagnostic biomarker studies.

APPENDIX: SUPPLEMENTARY RESULTS FOR SECTION 3

A.1. Supplementary results for Section 3.1.

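Proof of Theorem 3.1. First, note that

$$
\begin{aligned}
&n_D^{-1/2}[n_D r_D]\bigl(\widehat{\mathrm{ROC}}_{r_D,r_{\bar D}}(t) - \mathrm{ROC}(t)\bigr) \\
&\qquad = n_D^{-1/2}[n_D r_D]\bigl(\hat S_{D,r_D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - S_D(\hat S^{-1}_{\bar D,r_{\bar D}}(t))\bigr) \\
&\qquad\quad + n_D^{-1/2}[n_D r_D]\bigl(S_D(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - S_D(S^{-1}_{\bar D}(t))\bigr).
\end{aligned}
$$

The first term converges to a Kiefer process. We note that

$$
\begin{aligned}
&\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} \bigl|F_{\bar D}(\hat F^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \\
&\qquad = \frac{n_{\bar D}}{[n_{\bar D} d]} \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} \frac{[n_{\bar D} d]}{n_{\bar D}} \bigl|F_{\bar D}(\hat F^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \\
&\qquad \le \frac{n_{\bar D}}{[n_{\bar D} d]} \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} \frac{[n_{\bar D} r_{\bar D}]}{n_{\bar D}} \bigl|F_{\bar D}(\hat F^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr|.
\end{aligned}
$$

Therefore,

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} \bigl|F_{\bar D}(\hat F^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \to_{\mathrm{a.s.}} 0 \tag{A.1}
$$

by the Glivenko-Cantelli theorems [Theorems 1.51 and 1.52 in Csörgő and
Szyszkowicz (1998)] and because $n_{\bar D}/[n_{\bar D} d] \to 1/d$. Furthermore, $F^{-1}_{\bar D}(t)$ will be
continuous by (A1)-(A3) and will be uniformly continuous on $[a,b]$. Therefore,

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} \bigl|\hat F^{-1}_{\bar D,r_{\bar D}}(t) - F^{-1}_{\bar D}(t)\bigr| \to_{\mathrm{a.s.}} 0. \tag{A.2}
$$

We note that due to the continuity of $F_{\bar D}(x)$, $S^{-1}_{\bar D}(t) = F^{-1}_{\bar D}(1 - t)$ and
therefore (A.2) also applies to $\hat S^{-1}_{\bar D,r_{\bar D}}(t)$. From Corollary 1.A in Csörgő and
Szyszkowicz (1998), (A.2) and the uniform continuity of the Kiefer process,
we have

$$
n_D^{-1/2}[n_D r_D]\bigl(\hat S_{D,r_D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - S_D(\hat S^{-1}_{\bar D,r_{\bar D}}(t))\bigr) \to_d K_1(\mathrm{ROC}(t), r_D). \tag{A.3}
$$

The second term can be rewritten as

$$
\begin{aligned}
&n_D^{-1/2}[n_D r_D]\bigl(S_D(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - S_D(S^{-1}_{\bar D}(t))\bigr) \\
&\qquad = \frac{n_D^{-1/2}[n_D r_D]}{n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]}\,
\frac{S_D(S^{-1}_{\bar D}(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)))) - S_D(S^{-1}_{\bar D}(t))}{S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t} \\
&\qquad\qquad \times n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigl(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - \hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))\bigr) \\
&\qquad\quad + \frac{n_D^{-1/2}[n_D r_D]}{n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]}\,
\frac{S_D(S^{-1}_{\bar D}(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)))) - S_D(S^{-1}_{\bar D}(t))}{S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t} \\
&\qquad\qquad \times n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigl(\hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr).
\end{aligned}
$$

By the mean value theorem, there exists a $\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))$ between $S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))$
and $t$ such that

$$
\frac{S_D(S^{-1}_{\bar D}(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)))) - S_D(S^{-1}_{\bar D}(t))}{S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t}
= \frac{f_D(S^{-1}_{\bar D}(\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))))}{f_{\bar D}(S^{-1}_{\bar D}(\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))))}.
$$

From (A.1), we know that $S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) \to_{\mathrm{a.s.}} t$, uniformly for $t \in [a,b]$, $r_D \in [c,1]$
and $r_{\bar D} \in [d,1]$, and, therefore, $\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) \to_{\mathrm{a.s.}} t$, uniformly for $t \in [a,b]$,
$r_D \in [c,1]$ and $r_{\bar D} \in [d,1]$. This, along with the uniform continuity of

$$
\frac{f_D(S^{-1}_{\bar D}(t))}{f_{\bar D}(S^{-1}_{\bar D}(t))},
$$

allows us to conclude that

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b}
\left| \frac{f_D(S^{-1}_{\bar D}(\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))))}{f_{\bar D}(S^{-1}_{\bar D}(\tilde S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))))}
- \frac{f_D(S^{-1}_{\bar D}(t))}{f_{\bar D}(S^{-1}_{\bar D}(t))} \right| \to_{\mathrm{a.s.}} 0,
$$

which implies

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b}
\left| \frac{S_D(S^{-1}_{\bar D}(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)))) - S_D(S^{-1}_{\bar D}(t))}{S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t}
- \frac{f_D(S^{-1}_{\bar D}(t))}{f_{\bar D}(S^{-1}_{\bar D}(t))} \right| \to_{\mathrm{a.s.}} 0. \tag{A.4}
$$

For all $r_{\bar D} \in [d,1]$,

$$
\sup_{a \le t \le b} \bigl|\hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \le_{\mathrm{a.s.}} \frac{1}{[n_{\bar D} r_{\bar D}]}.
$$

Therefore,

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}] \bigl|\hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \le_{\mathrm{a.s.}} n_{\bar D}^{-1/2}
$$

and

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le t \le b} n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}] \bigl|\hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - t\bigr| \to_{\mathrm{a.s.}} 0. \tag{A.5}
$$

From Corollary 1.A in Csörgő and Szyszkowicz (1998), (A.2) and the uniform
continuity of the Kiefer process, we have

$$
n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigl(S_{\bar D}(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - \hat S_{\bar D,r_{\bar D}}(\hat S^{-1}_{\bar D,r_{\bar D}}(t))\bigr) \to_d K_2(t, r_{\bar D}). \tag{A.6}
$$

By (A.4), (A.5), (A.6) and noting that $n_D^{-1/2}[n_D r_D] / \bigl(n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigr) \to \lambda^{1/2} r_D / r_{\bar D}$, we conclude
that

$$
n_D^{-1/2}[n_D r_D]\bigl(S_D(\hat S^{-1}_{\bar D,r_{\bar D}}(t)) - S_D(S^{-1}_{\bar D}(t))\bigr)
\to_d \lambda^{1/2}\, \frac{r_D}{r_{\bar D}}\, \frac{f_D(S^{-1}_{\bar D}(t))}{f_{\bar D}(S^{-1}_{\bar D}(t))}\, K_2(t, r_{\bar D}). \tag{A.7}
$$

Summing (A.3) and (A.7) gives the desired result. □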
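
The Kiefer process limit used in this proof has covariance min(r_1, r_2)(min(s, t) - st), a standard property. The following is a minimal Monte Carlo sketch that checks this covariance for the sequential empirical process of uniform observations; the function names, sample size and number of replicates are illustrative assumptions and are not part of the proof.

```python
import math
import random


def sequential_empirical_process(uniforms, x, r):
    """n^{-1/2} [n r] (F_hat_r(x) - x) using the first [n r] uniforms."""
    n = len(uniforms)
    m = math.floor(n * r + 1e-9)  # [n r], guarded against rounding noise
    count = sum(u <= x for u in uniforms[:m])
    return (count - m * x) / math.sqrt(n)


def kiefer_covariance(s, r1, t, r2):
    """Covariance of a Kiefer process at (s, r1) and (t, r2)."""
    return min(r1, r2) * (min(s, t) - s * t)


def monte_carlo_covariance(n, s, r1, t, r2, reps, seed=0):
    """Average product of the (mean-zero) process at two index points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        total += (
            sequential_empirical_process(sample, s, r1)
            * sequential_empirical_process(sample, t, r2)
        )
    return total / reps


if __name__ == "__main__":
    print(
        round(monte_carlo_covariance(100, 0.3, 0.4, 0.6, 1.0, reps=4000), 4),
        round(kiefer_covariance(0.3, 0.4, 0.6, 1.0), 4),
    )
```

The dependence on min(r_1, r_2) is the independent increments structure that group sequential methods rely on.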
A.2. Supplementary results for Section 3.2. 

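Corollary A.1. Assume (A1)-(A4) hold and let $f_D(S^{-1}_{\bar D}(t))/f_{\bar D}(S^{-1}_{\bar D}(t))$ be bounded
on $[a,b]$. For $t_1, t_2, \ldots, t_J \in (0,1)$, $r_{D,1}, r_{D,2}, \ldots, r_{D,J} \in (0,1]$ and
$r_{\bar D,1}, r_{\bar D,2}, \ldots, r_{\bar D,J} \in (0,1]$, a vector of arbitrary points on the sequential empirical PPV
curve, $(\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(t_1), \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(t_2), \ldots, \widehat{\mathrm{PPV}}_{r_{D,J},r_{\bar D,J}}(t_J))$, is approximately
multivariate normal with

$$
\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(t_j) \sim N\bigl(\mathrm{PPV}(t_j),\ \sigma^2_{\mathrm{PPV}_{r_{D,j},r_{\bar D,j}}(t_j)}\bigr), \qquad j = 1, 2, \ldots, J,
$$

$$
\sigma^2_{\mathrm{PPV}_{r_{D,j},r_{\bar D,j}}(t_j)} = \left( \frac{t_j(1-p)p}{(\mathrm{ROC}(t_j)p + t_j(1-p))^2} \right)^2 \sigma^2_{\mathrm{ROC}_{r_{D,j},r_{\bar D,j}}(t_j)}
$$

and

$$
\begin{aligned}
&\mathrm{Cov}\bigl[\widehat{\mathrm{PPV}}_{r_{D,i},r_{\bar D,i}}(t_i),\ \widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(t_j)\bigr] \\
&\qquad = \left( \frac{t_i(1-p)p}{(\mathrm{ROC}(t_i)p + t_i(1-p))^2} \right) \left( \frac{t_j(1-p)p}{(\mathrm{ROC}(t_j)p + t_j(1-p))^2} \right) \\
&\qquad\quad \times \mathrm{Cov}\bigl[\widehat{\mathrm{ROC}}_{r_{D,i},r_{\bar D,i}}(t_i),\ \widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)\bigr],
\end{aligned}
$$

where $\sigma^2_{\mathrm{ROC}_{r_{D,j},r_{\bar D,j}}(t_j)}$ and $\mathrm{Cov}[\widehat{\mathrm{ROC}}_{r_{D,i},r_{\bar D,i}}(t_i), \widehat{\mathrm{ROC}}_{r_{D,j},r_{\bar D,j}}(t_j)]$ are as defined
in Corollary 3.3.

Proof. Immediate from Theorem 3.5. □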

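The variance and covariance expressions in Corollary A.1 are a delta-method transformation of the corresponding ROC quantities through PPV(t) = ROC(t)p / (ROC(t)p + t(1 - p)). The following is a minimal sketch of that transformation; the function names are hypothetical, and the ROC variance and covariance are taken as inputs (they would come from Corollary 3.3).

```python
def ppv_from_roc(roc, t, p):
    """PPV at false positive fraction t, given ROC(t) and prevalence p."""
    return roc * p / (roc * p + t * (1.0 - p))


def ppv_gradient(roc, t, p):
    """Derivative of PPV with respect to ROC(t), with t and p held fixed."""
    return t * (1.0 - p) * p / (roc * p + t * (1.0 - p)) ** 2


def ppv_variance(roc, t, p, var_roc):
    """Delta-method variance of the PPV estimate."""
    return ppv_gradient(roc, t, p) ** 2 * var_roc


def ppv_covariance(roc_i, t_i, roc_j, t_j, p, cov_roc):
    """Delta-method covariance between two points on the PPV curve."""
    return ppv_gradient(roc_i, t_i, p) * ppv_gradient(roc_j, t_j, p) * cov_roc


if __name__ == "__main__":
    # Illustrative values only: ROC(0.2) = 0.7, prevalence 0.2.
    print(round(ppv_from_roc(0.7, 0.2, 0.2), 4))
    print(round(ppv_gradient(0.7, 0.2, 0.2), 4))
```

Because the gradient depends only on ROC(t), t and p, any independent increments structure in the ROC estimates carries over to the PPV estimates at a fixed t.
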
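Corollary A.2. Assume (A1)-(A4) hold and let $f_D(S^{-1}_{\bar D}(t))/f_{\bar D}(S^{-1}_{\bar D}(t))$ be bounded
on $[a,b]$. For $t \in (0,1)$ and $J$ stopping times, $(\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(t), \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(t),
\ldots, \widehat{\mathrm{PPV}}_{r_{D,J},r_{\bar D,J}}(t))$ is approximately multivariate normal with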
~ 2 






and 



where $\sigma^2_{\mathrm{PPV}_{r_{D,j},r_{\bar D,j}}(t)}$ is defined as in Corollary A.1.

Proof. Immediate from Corollary A.1. □

A.3. Supplementary results for Section 3.3. 

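Proof of Theorem 3.6. The proof of Theorem 3.6 follows the proofs
found in Pyke and Shorack (1968). First, note that

$$
\begin{aligned}
&n_D^{-1/2}[n_D r_D]\bigl(\hat S_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - S_D(F^{-1}(u))\bigr) \\
&\qquad = n_D^{-1/2}[n_D r_D]\bigl(F_D(F^{-1}(u)) - F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad\quad + n_D^{-1/2}[n_D r_D]\bigl(F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr).
\end{aligned}
$$

The first term can be rewritten as

$$
\begin{aligned}
&n_D^{-1/2}[n_D r_D]\bigl(F_D(F^{-1}(u)) - F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad = \frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u}
\times n_D^{-1/2}[n_D r_D]\bigl(u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad\quad + \frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u}
\times p\, n_D^{-1/2}[n_D r_D]\bigl(\hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad\quad + \frac{n_D^{-1/2}[n_D r_D]}{n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]}\,
\frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u} \\
&\qquad\qquad \times (1 - p)\, n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigl(\hat F_{\bar D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - F_{\bar D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr).
\end{aligned}
$$

We begin by showing that $F(\hat F^{-1}_{r_D,r_{\bar D}}(u))$ converges to $u$ uniformly,

$$
\begin{aligned}
&\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u\bigr| \\
&\qquad \le \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \\
&\qquad\quad + \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|\hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u\bigr|.
\end{aligned}
$$

We note that

$$
\begin{aligned}
&\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \\
&\qquad \le p\,\frac{n_D}{[n_D c]} \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \frac{[n_D r_D]}{n_D} \bigl|F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \\
&\qquad\quad + (1 - p)\,\frac{n_{\bar D}}{[n_{\bar D} d]} \sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \frac{[n_{\bar D} r_{\bar D}]}{n_{\bar D}} \bigl|F_{\bar D}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{\bar D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \\
&\qquad \to_{\mathrm{a.s.}} 0,
\end{aligned}
$$

by the Glivenko-Cantelli theorems [Theorems 1.51 and 1.52 in Csörgő and
Szyszkowicz (1998)], along with the fact that $n_D/[n_D c] \to 1/c$ and $n_{\bar D}/[n_{\bar D} d] \to 1/d$. For
all $(r_D, r_{\bar D}) \in (0,1] \times (0,1]$,

$$
\sup_{a \le u \le b} \bigl|u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \le_{\mathrm{a.s.}} \left( \frac{p}{[n_D r_D]} \vee \frac{1 - p}{[n_{\bar D} r_{\bar D}]} \right).
$$

Therefore,

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr|
\le_{\mathrm{a.s.}} \left( \frac{p}{[n_D c]} \vee \frac{1 - p}{[n_{\bar D} d]} \right) \to 0,
$$

which implies that

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b} \bigl|F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u\bigr| \to_{\mathrm{a.s.}} 0. \tag{A.8}
$$

We note that (A.8) also implies that $F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))$ and $F_{\bar D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))$ converge
uniformly to $F_D(F^{-1}(u))$ and $F_{\bar D}(F^{-1}(u))$, respectively, which can be
seen by noting that the difference between $F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))$ and $F_D(F^{-1}(u))$
will always have the same sign as the difference between $F_{\bar D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))$
and $F_{\bar D}(F^{-1}(u))$.

By the mean value theorem, there exists $\tilde F(\hat F^{-1}_{r_D,r_{\bar D}}(u))$ between $u$
and $F(\hat F^{-1}_{r_D,r_{\bar D}}(u))$, such that

$$
\frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u}
= \frac{f_D(F^{-1}(\tilde F(\hat F^{-1}_{r_D,r_{\bar D}}(u))))}{f(F^{-1}(\tilde F(\hat F^{-1}_{r_D,r_{\bar D}}(u))))}.
$$

The uniform continuity of $f_D(F^{-1}(u))/f(F^{-1}(u))$, combined with the fact that

$$
F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) \to_{\mathrm{a.s.}} u
$$

uniformly, allows us to conclude

$$
\sup_{c \le r_D \le 1}\ \sup_{d \le r_{\bar D} \le 1}\ \sup_{a \le u \le b}
\left| \frac{f_D(F^{-1}(\tilde F(\hat F^{-1}_{r_D,r_{\bar D}}(u))))}{f(F^{-1}(\tilde F(\hat F^{-1}_{r_D,r_{\bar D}}(u))))}
- \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))} \right| \to_{\mathrm{a.s.}} 0. \tag{A.9}
$$

For all $(r_D, r_{\bar D}) \in (0,1] \times (0,1]$,

$$
\sup_{a \le u \le b} n_D^{-1/2}[n_D r_D] \bigl|u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr|
\le_{\mathrm{a.s.}} \left( p \vee \frac{[n_D r_D]}{[n_{\bar D} r_{\bar D}]}(1 - p) \right) n_D^{-1/2} \to 0.
$$

Therefore, as $n_D \to \infty$ and $n_{\bar D} \to \infty$,

$$
\sup_{0 < r_D \le 1}\ \sup_{0 < r_{\bar D} \le 1}\ \sup_{a \le u \le b} n_D^{-1/2}[n_D r_D] \bigl|u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr| \to_{\mathrm{a.s.}} 0.
$$

Combining this result with (A.9) allows us to conclude that

$$
\frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u}
\times n_D^{-1/2}[n_D r_D]\bigl(u - \hat F_{r_D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \to_{\mathrm{a.s.}} 0.
$$

Corollary 1.A in Csörgő and Szyszkowicz (1998), (A.9) and the uniform
continuity of the Kiefer process allow us to conclude

$$
\begin{aligned}
&\frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u}
\times p\, n_D^{-1/2}[n_D r_D]\bigl(\hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad \to_d \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}\, p\, K_1(F_D(F^{-1}(u)), r_D)
\end{aligned} \tag{A.10}
$$

and

$$
\begin{aligned}
&\frac{n_D^{-1/2}[n_D r_D]}{n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]}\,
\frac{F_D(F^{-1}(F(\hat F^{-1}_{r_D,r_{\bar D}}(u)))) - F_D(F^{-1}(u))}{F(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - u} \\
&\qquad \times (1 - p)\, n_{\bar D}^{-1/2}[n_{\bar D} r_{\bar D}]\bigl(\hat F_{\bar D,r_{\bar D}}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - F_{\bar D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad \to_d \lambda^{1/2}\, \frac{r_D}{r_{\bar D}}\, \frac{f_D(F^{-1}(u))}{f(F^{-1}(u))}\, (1 - p)\, K_2(F_{\bar D}(F^{-1}(u)), r_{\bar D}).
\end{aligned} \tag{A.11}
$$

The second term converges in distribution to a Kiefer process

$$
\begin{aligned}
&n_D^{-1/2}[n_D r_D]\bigl(F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - \hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad = -n_D^{-1/2}[n_D r_D]\bigl(\hat F_{D,r_D}(\hat F^{-1}_{r_D,r_{\bar D}}(u)) - F_D(\hat F^{-1}_{r_D,r_{\bar D}}(u))\bigr) \\
&\qquad \to_d -K_1(F_D(F^{-1}(u)), r_D)
\end{aligned} \tag{A.12}
$$

by Corollary 1.A in Csörgő and Szyszkowicz (1998). Summing (A.10), (A.11)
and (A.12) gives the desired result. □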

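Corollary A.3. Assume (A1)-(A4) hold and let $f_D(F^{-1}(u))/f(F^{-1}(u))$ be bounded
on $[a,b]$. For $u_1, u_2, \ldots, u_J \in (0,1)$, $r_{D,1}, r_{D,2}, \ldots, r_{D,J} \in (0,1]$ and
$r_{\bar D,1}, r_{\bar D,2}, \ldots, r_{\bar D,J} \in (0,1]$, a vector of arbitrary points on the sequential empirical PPV
curve, $(\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1), \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(u_2), \ldots, \widehat{\mathrm{PPV}}_{r_{D,J},r_{\bar D,J}}(u_J))$, is approximately
multivariate normal with

$$
\widehat{\mathrm{PPV}}_{r_{D,j},r_{\bar D,j}}(u_j) \sim N\bigl(\mathrm{PPV}(u_j),\ \sigma^2_{\mathrm{PPV}_{r_{D,j},r_{\bar D,j}}(u_j)}\bigr), \qquad j = 1, 2, \ldots, J,
$$

with

$$
\begin{aligned}
&\mathrm{Cov}\bigl[\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1),\ \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(u_2)\bigr] \\
&\qquad = \frac{(1-p)^2 u_1}{(1-u_1)}\, \frac{f_{\bar D}(F^{-1}(u_1))}{f(F^{-1}(u_1))}\, \frac{f_{\bar D}(F^{-1}(u_2))}{f(F^{-1}(u_2))}
\times \frac{(r_{D,1} \wedge r_{D,2})(1 - \mathrm{NPV}(u_1))\,\mathrm{PPV}(u_2)}{n_D r_{D,1} r_{D,2}} \\
&\qquad\quad + \frac{p^2 u_1}{(1-u_1)}\, \frac{f_D(F^{-1}(u_1))}{f(F^{-1}(u_1))}\, \frac{f_D(F^{-1}(u_2))}{f(F^{-1}(u_2))}
\times \frac{(r_{\bar D,1} \wedge r_{\bar D,2})\,\mathrm{NPV}(u_1)(1 - \mathrm{PPV}(u_2))}{n_{\bar D} r_{\bar D,1} r_{\bar D,2}}
\end{aligned}
$$

when $u_1 < u_2$ and

$$
\begin{aligned}
&\mathrm{Cov}\bigl[\widehat{\mathrm{PPV}}_{r_{D,1},r_{\bar D,1}}(u_1),\ \widehat{\mathrm{PPV}}_{r_{D,2},r_{\bar D,2}}(u_2)\bigr] \\
&\qquad = \frac{(1-p)^2 u_2}{(1-u_2)}\, \frac{f_{\bar D}(F^{-1}(u_1))}{f(F^{-1}(u_1))}\, \frac{f_{\bar D}(F^{-1}(u_2))}{f(F^{-1}(u_2))}
\times \frac{(r_{D,1} \wedge r_{D,2})(1 - \mathrm{NPV}(u_2))\,\mathrm{PPV}(u_1)}{n_D r_{D,1} r_{D,2}} \\
&\qquad\quad + \frac{p^2 u_2}{(1-u_2)}\, \frac{f_D(F^{-1}(u_1))}{f(F^{-1}(u_1))}\, \frac{f_D(F^{-1}(u_2))}{f(F^{-1}(u_2))}
\times \frac{(r_{\bar D,1} \wedge r_{\bar D,2})\,\mathrm{NPV}(u_2)(1 - \mathrm{PPV}(u_1))}{n_{\bar D} r_{\bar D,1} r_{\bar D,2}}
\end{aligned}
$$

when $u_2 < u_1$, where $\sigma^2_{\mathrm{PPV}_{r_{D,j},r_{\bar D,j}}(u_j)}$ is defined as in Corollary 3.8.

Proof. Immediate from Theorem 3.6. □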